714 research outputs found

    Spectral gene set enrichment (SGSE)

    Motivation: Gene set testing is typically performed in a supervised context to quantify the association between groups of genes and a clinical phenotype. In many cases, however, a gene set-based interpretation of genomic data is desired in the absence of a phenotype variable. Although methods exist for unsupervised gene set testing, they predominantly compute enrichment relative to clusters of the genomic variables with performance strongly dependent on the clustering algorithm and number of clusters. Results: We propose a novel method, spectral gene set enrichment (SGSE), for unsupervised competitive testing of the association between gene sets and empirical data sources. SGSE first computes the statistical association between gene sets and principal components (PCs) using our principal component gene set enrichment (PCGSE) method. The overall statistical association between each gene set and the spectral structure of the data is then computed by combining the PC-level p-values using the weighted Z-method with weights set to the PC variance scaled by Tracy-Widom test p-values. Using simulated data, we show that the SGSE algorithm can accurately recover spectral features from noisy data. To illustrate the utility of our method on real data, we demonstrate the superior performance of the SGSE method relative to standard cluster-based techniques for testing the association between MSigDB gene sets and the variance structure of microarray gene expression data. Availability: http://cran.r-project.org/web/packages/PCGSE/index.html Contact: [email protected] or [email protected]
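
The combination step described in this abstract can be sketched as a weighted Z-method (weighted Stouffer) calculation. This is a minimal illustration, not the PCGSE package's code: the function name `weighted_z_combine` is hypothetical, the weights are taken as given inputs because the abstract does not spell out exactly how the PC variances are scaled by the Tracy-Widom p-values, and the p-values are assumed to be one-sided and strictly between 0 and 1.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def weighted_z_combine(p_values, weights):
    """Combine one-sided p-values with the weighted Z-method.

    p_values: one p-value per PC for a single gene set (assumed in (0, 1)).
    weights:  one non-negative weight per PC (hypothetical inputs; in SGSE
              these would derive from PC variance and Tracy-Widom p-values).
    Returns the combined one-sided p-value.
    """
    if not p_values or len(p_values) != len(weights):
        raise ValueError("p_values and weights must be non-empty and equal length")
    if any(not 0.0 < p < 1.0 for p in p_values):
        raise ValueError("p-values must lie strictly between 0 and 1")

    # Convert each p-value to a Z-score: small p gives a large positive Z.
    z_scores = [_STD_NORMAL.inv_cdf(1.0 - p) for p in p_values]

    norm = sqrt(sum(w * w for w in weights))
    if norm == 0.0:
        raise ValueError("at least one weight must be non-zero")

    # Weighted sum of Z-scores, rescaled so the result is again standard normal
    # under the null hypothesis (assuming independent tests).
    combined_z = sum(w * z for w, z in zip(weights, z_scores)) / norm
    return 1.0 - _STD_NORMAL.cdf(combined_z)
```

For example, `weighted_z_combine([0.01, 0.20, 0.70], [5.0, 2.0, 0.5])` lets the first PC dominate the combined result, which is the intent of weighting by the variance each PC explains.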

    Principal component gene set enrichment (PCGSE)

    Motivation: Although principal component analysis (PCA) is widely used for the dimensional reduction of biomedical data, interpretation of PCA results remains daunting. Most existing methods attempt to explain each principal component (PC) in terms of a small number of variables by generating approximate PCs with few non-zero loadings. Although useful when just a few variables dominate the population PCs, these methods are often inadequate for characterizing the PCs of high-dimensional genomic data. For genomic data, reproducible and biologically meaningful PC interpretation requires methods based on the combined signal of functionally related sets of genes. While gene set testing methods have been widely used in supervised settings to quantify the association of groups of genes with clinical outcomes, these methods have seen only limited application for testing the enrichment of gene sets relative to sample PCs. Results: We describe a novel approach, principal component gene set enrichment (PCGSE), for computing the statistical association between gene sets and the PCs of genomic data. The PCGSE method performs a two-stage competitive gene set test using the correlation between each gene and each PC as the gene-level test statistic with flexible choice of both the gene set test statistic and the method used to compute the null distribution of the gene set statistic. Using simulated data with simulated gene sets and real gene expression data with curated gene sets, we demonstrate that biologically meaningful and computationally efficient results can be obtained from a simple parametric version of the PCGSE method that performs a correlation-adjusted two-sample t-test between the gene-level test statistics for gene set members and genes not in the set. Availability: http://cran.r-project.org/web/packages/PCGSE/index.html Contact: [email protected] or [email protected]
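
The two-stage test described in this abstract can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the PCGSE implementation: the PC scores are assumed to be precomputed, the function names `pearson` and `gene_set_t_statistic` are hypothetical, and the t-statistic is the ordinary pooled-variance version without the correlation adjustment that the abstract mentions.

```python
from math import sqrt
from statistics import mean, variance


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (neither constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def gene_set_t_statistic(expression, pc_scores, gene_set):
    """Competitive gene set statistic for one PC.

    expression: dict mapping gene name -> list of values across samples.
    pc_scores:  list of sample scores on one PC (assumed precomputed).
    gene_set:   set of gene names; needs at least two members and two
                non-members so both group variances are defined.
    Returns an unadjusted pooled-variance two-sample t-statistic.
    """
    # Stage 1: gene-level statistic = correlation between each gene and the PC.
    gene_stats = {g: pearson(v, pc_scores) for g, v in expression.items()}

    # Stage 2: compare statistics of set members against all other genes.
    inside = [s for g, s in gene_stats.items() if g in gene_set]
    outside = [s for g, s in gene_stats.items() if g not in gene_set]
    n1, n2 = len(inside), len(outside)

    pooled = ((n1 - 1) * variance(inside) + (n2 - 1) * variance(outside)) / (n1 + n2 - 2)
    return (mean(inside) - mean(outside)) / sqrt(pooled * (1.0 / n1 + 1.0 / n2))
```

A large positive value suggests that set members are more strongly and positively correlated with the PC than other genes. Because genes within a set tend to be correlated with one another, this unadjusted statistic would overstate significance, which is the problem the correlation adjustment in the abstract addresses.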

    Eigenvectors from Eigenvalues Sparse Principal Component Analysis (EESPCA)

    We present a novel technique for sparse principal component analysis. This method, named Eigenvectors from Eigenvalues Sparse Principal Component Analysis (EESPCA), is based on the recently detailed formula for computing normed, squared eigenvector loadings of a Hermitian matrix from the eigenvalues of the full matrix and associated sub-matrices. Relative to the state-of-the-art LASSO-based sparse PCA method of Witten, Tibshirani and Hastie, the EESPCA technique offers a two-orders-of-magnitude improvement in computational speed, does not require estimation of tuning parameters, and can more accurately identify true zero principal component loadings across a range of data matrix sizes and covariance structures. Importantly, EESPCA achieves these performance benefits while maintaining a reconstruction error close to that generated by the Witten et al. approach. EESPCA is a practical and effective technique for sparse PCA with particular relevance to computationally demanding problems such as the analysis of large data matrices or statistical techniques like resampling that involve the repeated application of sparse PCA
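
The abstract does not state the formula it builds on. It presumably refers to the eigenvector-eigenvalue identity, written out here under assumed notation: $A$ is an $n \times n$ Hermitian matrix with eigenvalues $\lambda_1(A), \dots, \lambda_n(A)$, $v_{i,j}$ is the $j$-th component of the unit-norm eigenvector for $\lambda_i(A)$, and $M_j$ is the $(n-1) \times (n-1)$ sub-matrix obtained by deleting row $j$ and column $j$ of $A$.

```latex
\left| v_{i,j} \right|^{2} \prod_{k=1,\, k \neq i}^{n} \bigl( \lambda_i(A) - \lambda_k(A) \bigr)
  = \prod_{k=1}^{n-1} \bigl( \lambda_i(A) - \lambda_k(M_j) \bigr)
```

When $\lambda_i(A)$ is a simple eigenvalue, dividing both sides by the product on the left gives the squared loading $|v_{i,j}|^2$ from eigenvalues alone, which matches the abstract's description of computing normed, squared loadings from the eigenvalues of the full matrix and its sub-matrices.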

    Sir William Macdonald: an Unfinished Portrait


    Principal Component Gene Set Enrichment (PCGSE)

    Background: Although principal component analysis (PCA) is widely used for the dimensional reduction of biomedical data, interpretation of PCA results remains daunting. Most existing interpretation methods attempt to explain each principal component (PC) in terms of a small number of variables by generating approximate PCs with mainly zero loadings. Although useful when just a few variables dominate the population PCs, these methods can perform poorly on genomic data, where interesting biological features are frequently represented by the combined signal of functionally related sets of genes. While gene set testing methods have been widely used in supervised settings to quantify the association of groups of genes with clinical outcomes, these methods have seen only limited application for testing the enrichment of gene sets relative to sample PCs. Results: We describe a novel approach, principal component gene set enrichment (PCGSE), for unsupervised gene set testing relative to the sample PCs of genomic data. The PCGSE method computes the statistical association between gene sets and individual PCs using a two-stage competitive gene set test. To demonstrate the efficacy of the PCGSE method, we use simulated and real gene expression data to evaluate the performance of various gene set test statistics and significance tests. Conclusions: Gene set testing is an effective approach for interpreting the PCs of high-dimensional genomic data. As shown using both simulated and real datasets, the PCGSE method can generate biologically meaningful and computationally efficient results via a two-stage, competitive parametric test that correctly accounts for inter-gene correlation

    Pan-cancer evaluation of gene expression and somatic alteration data for cancer prognosis prediction

    Background: Over the past decades, approaches for diagnosing and treating cancer have seen significant improvement. However, the variability of patient and tumor characteristics has limited progress on methods for prognosis prediction. The development of high-throughput omics technologies now provides multiple approaches for characterizing tumors. Although a large number of published studies have focused on integration of multi-omics data and use of pathway-level models for cancer prognosis prediction, there still exists a gap of knowledge regarding the prognostic landscape across multi-omics data for multiple cancer types using both gene-level and pathway-level predictors. Methods: In this study, we systematically evaluated three often available types of omics data (gene expression, copy number variation and somatic point mutation) covering both DNA-level and RNA-level features. We evaluated the landscape of predictive performance of these three omics modalities for 33 cancer types in the TCGA using a Lasso or Group Lasso-penalized Cox model and either gene or pathway level predictors. Results: We constructed the prognostic landscape using three types of omics data for 33 cancer types on both the gene and pathway levels. Based on this landscape, we found that predictive performance is cancer type dependent and we also highlighted the cancer types and omics modalities that support the most accurate prognostic models. In general, models estimated on gene expression data provide the best predictive performance on either gene or pathway level and adding copy number variation or somatic point mutation data to gene expression data does not improve predictive performance, with some exceptional cohorts including low grade glioma and thyroid cancer. In general, pathway-level models have better interpretative performance, higher stability and smaller model size across multiple cancer types and omics data types relative to gene-level models. 
    Conclusions: Based on this landscape and comprehensive comparison, models estimated on gene expression data provide the best predictive performance at both the gene and pathway levels. Pathway-level models have better interpretative performance, higher stability and smaller model size relative to gene-level models

    Understanding violence through social media

    While social media analysis has been widely utilized to predict various market and political trends, its utilization to improve geospatial conflict prediction in contested environments remains understudied. To determine the feasibility of social media utilization in conflict prediction, we compared historical conflict data and social media metadata, utilizing over 829,537 geo-referenced messages sent through the Twitter network within Iraq from August 2013 to July 2014. From our research, we conclude that social media metadata has a positive impact on conflict prediction when compared with historical conflict data. Additionally, we find that utilizing the most extreme negative terminology from a locally derived social media lexicon provided the most significant predictive accuracy for determining areas that would experience subsequent violence. We suggest future research projects center on improving the conflict prediction capability of social media data and include social media analysis in operational assessments. http://archive.org/details/understandingvio1094556920 Major, United States Army; Lieutenant Commander, United States Navy. Approved for public release; distribution is unlimited

    Differential effects of fenofibrate versus atorvastatin on the concentrations of E-selectin and vascular cellular adhesion molecule-1 in patients with type 2 diabetes mellitus and mixed hyperlipoproteinemia: a randomized cross-over trial

    BACKGROUND: Diabetic dyslipoproteinemia is characterized by hypertriglyceridemia, low HDL-cholesterol and often elevated LDL-cholesterol and is a strong risk factor for atherosclerosis. Adhesion molecule levels are elevated both in hyperlipoproteinemia and diabetes mellitus. It is unclear whether fibrate or statin therapy has more beneficial effects on adhesion molecule concentrations. METHODS: Atorvastatin (10 mg/d) was compared to fenofibrate (200 mg/d) each for 6 weeks separated by a 6 week washout period in 11 patients (6 male, 5 female; 61.8 ± 8.2 years; body mass index 29.8 ± 3.1 kg/m²) with type 2 diabetes mellitus (HbA1c 7.3 ± 1.1%) and mixed hyperlipoproteinemia using a randomized, cross-over design. Fasting blood glucose, HbA1c, lipid parameters, E-selectin, ICAM-1, VCAM-1, and fibrinogen concentrations were determined before and after each drug. RESULTS: Glucose and HbA1c concentrations remained unchanged during the whole study period. LDL cholesterol was reduced during atorvastatin therapy, while triglycerides were lowered more effectively with fenofibrate. Comparison of pre- and post-treatment concentrations of E-selectin showed a reduction during atorvastatin (-7%, p = 0.11) and fenofibrate (-10%, p < 0.05) therapy. Atorvastatin treatment reduced VCAM-1 levels by 4% (p < 0.05), while VCAM-1 concentrations remained unchanged (+1%, ns) during fenofibrate therapy. However, direct comparisons of post-treatment levels during both forms of therapy were not statistically significant. ICAM-1 levels were not influenced by either form of therapy. CONCLUSIONS: In addition to the different beneficial effects on lipid metabolism, both drugs appear to lower adhesion molecule plasma concentrations in a different manner in patients with type 2 diabetes and mixed hyperlipoproteinemia. Our observations should be confirmed in a larger cohort of such patients

    A Dietary-Wide Association Study (DWAS) of Environmental Metal Exposure in US Children and Adults

    Background: A growing body of evidence suggests that exposure to toxic metals occurs through diet, but few studies have comprehensively examined dietary sources of exposure in US populations. Purpose: Our goal was to perform a novel dietary-wide association study (DWAS) to identify specific dietary sources of lead, cadmium, mercury, and arsenic exposure in US children and adults. Methods: We combined data from the National Health and Nutrition Examination Survey with data from the US Department of Agriculture’s Food Intakes Converted to Retail Commodities Database to examine associations between 49 different foods and environmental metal exposure. Using blood and urinary biomarkers for lead, cadmium, mercury, and arsenic, we compared sources of dietary exposure among children to those among adults. Results: Diet accounted for more of the variation in mercury and arsenic than in lead and cadmium. For instance, we estimate that 4.5% of the variation of mercury among children and 10.5% among adults is explained by diet. We identified a previously unrecognized association between rice consumption and mercury in a US study population: adjusted for other dietary sources such as seafood, an increase of 10 g/day of rice consumption was associated with a 4.8% (95% CI: 3.6, 5.2) increase in blood mercury concentration. Associations between diet and metal exposure were similar among children and adults, and we recapitulated other known dietary sources of exposure. Conclusion: Utilizing this combination of data sources, this approach has the potential to identify and monitor dietary sources of metal exposure in the US population

    Gene Ontology Analysis of Pairwise Genetic Associations in Two Genome-Wide Studies of Sporadic ALS

    It is increasingly clear that common human diseases have a complex genetic architecture characterized by both additive and nonadditive genetic effects. The goal of the present study was to determine whether patterns of both additive and nonadditive genetic associations aggregate in specific functional groups as defined by the Gene Ontology (GO)